    GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

    From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable these performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph-analytics pipelines compose graph-parallel and data-parallel systems through external storage systems, leading to extensive data movement and a complicated programming model. To address these challenges, we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques, such as automatic join rewrites, to implement these graph-parallel operators efficiently. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves performance comparable to specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use.
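    The idea that a Pregel-style operator can be cast in relational terms can be illustrated with a small sketch: one superstep becomes a join of the edge table with the vertex table (a "triplet" view), a group-by that merges messages per destination, and a join back to the vertices where the vertex program runs. This is plain Python, not GraphX's API; the names `superstep` and `pagerank`, the damping factor 0.85, and the use of PageRank as the vertex program are illustrative assumptions.

```python
from collections import defaultdict
from functools import reduce


def superstep(vertices, edges, send_msg, merge_msg, vprog):
    """One Pregel-style superstep written as join -> group-by -> join.

    vertices: dict vertex_id -> attribute (the vertex "table")
    edges: list of (src, dst) pairs (the edge "table")
    """
    # Join edges with vertices on src to form (src, dst, src_attr) triplets.
    triplets = [(src, dst, vertices[src]) for src, dst in edges]
    # Group the messages by destination and merge them (like GROUP BY dst).
    inbox = defaultdict(list)
    for src, dst, src_attr in triplets:
        inbox[dst].append(send_msg(src_attr))
    merged = {vid: reduce(merge_msg, msgs) for vid, msgs in inbox.items()}
    # Left-join the merged messages back to vertices and apply the vertex program.
    return {vid: vprog(attr, merged.get(vid)) for vid, attr in vertices.items()}


def pagerank(edges, num_iters=20, damping=0.85):
    """PageRank as a vertex program; vertices without out-edges would leak rank."""
    out_deg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        nodes.update((src, dst))
    n = len(nodes)
    # Vertex attribute is (rank, out-degree).
    vertices = {v: (1.0 / n, out_deg[v]) for v in nodes}

    def send(attr):
        return attr[0] / attr[1]  # split the source's rank evenly over its out-edges

    def merge(a, b):
        return a + b

    def vprog(attr, msg):
        incoming = msg if msg is not None else 0.0
        return ((1 - damping) / n + damping * incoming, attr[1])

    for _ in range(num_iters):
        vertices = superstep(vertices, edges, send, merge, vprog)
    return {v: rank for v, (rank, _) in vertices.items()}


ranks = pagerank([(1, 2), (2, 1), (2, 3), (3, 1)])
```

    Because every step is an ordinary join or aggregation, a system that already optimizes joins can, in principle, execute the same graph computation; on a simple three-vertex cycle every rank stays at approximately 1/3.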

    [Demo] Low-latency Spark queries on updatable data

    As data science is increasingly deployed in operational applications, it becomes important for data science frameworks to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation, and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries over continuously growing data.
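    How an index on a cached table speeds up lookups and joins can be sketched with a hash index: a point lookup reads only the matching rows, and a join probes the existing index once per input row instead of rebuilding or shuffling the cached side. This is a hypothetical single-partition illustration; `IndexedTable`, `get`, and `join` are invented names, not the Indexed DataFrame API.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical cached table with a hash index on one key column."""

    def __init__(self, rows, key):
        self.rows = list(rows)
        self.key = key
        # Hash index: key value -> positions of rows holding that value.
        self.index = defaultdict(list)
        for pos, row in enumerate(self.rows):
            self.index[row[key]].append(pos)

    def get(self, value):
        # Point lookup reads only the matching rows instead of scanning all rows.
        return [self.rows[pos] for pos in self.index.get(value, [])]

    def join(self, probe_rows, probe_key):
        # Index nested-loop (inner) join: each probe row looks up its matches
        # in the existing index, so the cached side is never rebuilt.
        for probe in probe_rows:
            for match in self.get(probe[probe_key]):
                yield {**match, **probe}  # probe columns win on name clashes


users = IndexedTable([{"uid": 1, "name": "ann"}, {"uid": 2, "name": "bob"}], key="uid")
follows = [{"uid": 1, "dst": 2}, {"uid": 2, "dst": 1}, {"uid": 3, "dst": 1}]
joined = list(users.join(follows, "uid"))  # uid 3 has no match, so it is dropped
```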

    Prevalence of childhood asthma and its immediate outcome at a tertiary care rural hospital

    Introduction: Asthma is a chronic inflammatory disorder of the airways resulting in episodic airway obstruction. Globally, the prevalence of childhood asthma is increasing, despite improvements in investigation and treatment. Childhood asthma seemed more prevalent in urban populations and is now seen even in rural areas of India. Objectives: To determine the prevalence and to assess the risk factors, severity, and immediate outcome of the treatment offered to asthmatic children in a tertiary rural hospital. Materials and Methods: All diagnosed asthmatic children up to 18 years of age were enrolled in the study. Patients with pulmonary Koch's, congenital heart disease, or chronic lung disease were excluded from the study. The clinical profile of the recruited patients was recorded. Results: The prevalence of childhood asthma among children visiting our department was 3.93%. Fifty-eight children (48.33%) had onset before 6 years of age. Asthma was more prevalent in boys. A total of 116 (96.66%) children presented with a complaint of cough, and 118 (98.33%) had associated breathlessness. Common precipitating factors were change in season (71.66%), pollen allergy (58.33%), air pollution (45.00%), and passive smoking (23.33%). Exercise-induced asthma was seen in 55% of cases and diurnal variation in 60%, and 28.33% of children had a family history of atopic disease. The majority of patients were undernourished. The average duration of stay in persistent asthma was 1.8 times longer than in intermittent asthma. Conclusion: A significant number of patients become symptomatic before 6 years of age. Preventing children's exposure to passive smoking, environmental improvement, and allergen avoidance are major aspects of preventing asthma exacerbations.

    In-Memory Indexed Caching for Distributed Data Processing

    Powerful abstractions such as dataframes are only as efficient as their underlying runtime system. The de facto distributed data processing framework, Apache Spark, is poorly suited to modern cloud-based data-science workloads because of its outdated assumptions: static datasets analyzed using coarse-grained transformations. In this paper, we introduce the Indexed DataFrame, an in-memory cache that supports a dataframe abstraction incorporating indexing capabilities for fast lookup and join operations. Moreover, it supports appends with multi-version concurrency control. We implement the Indexed DataFrame as a lightweight, standalone library that can be integrated into existing Spark programs with minimal effort. We analyze the performance of the Indexed DataFrame in cluster and cloud deployments with real-world datasets and benchmarks, using both Apache Spark and Databricks Runtime. In our evaluation, we show that the Indexed DataFrame significantly speeds up query execution compared to a non-indexed dataframe, while incurring modest memory overhead.
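    Appends under multi-version concurrency control can be sketched as an append-only store in which every appended batch receives a new version number and each reader pins a snapshot version, so a query that started before an append never sees that append's rows. This is a hypothetical single-process illustration (`VersionedCache`, `append`, `snapshot`, and `lookup` are invented names), not the library's implementation; it ignores distribution, memory management, and eviction.

```python
import threading
from collections import defaultdict


class VersionedCache:
    """Hypothetical append-only indexed cache with snapshot (multi-version) reads."""

    def __init__(self, key):
        self.key = key
        self.rows = []                   # append-only row store
        self.index = defaultdict(list)   # key -> [(version, row position), ...]
        self.version = 0                 # latest published version
        self._lock = threading.Lock()    # serializes writers; readers never take it

    def append(self, batch):
        with self._lock:
            new_version = self.version + 1
            for row in batch:
                self.rows.append(row)
                self.index[row[self.key]].append((new_version, len(self.rows) - 1))
            # Publish last: entries tagged new_version stay invisible until now.
            self.version = new_version
            return new_version

    def snapshot(self):
        # A reader pins the latest published version for its whole query.
        return self.version

    def lookup(self, value, as_of):
        # Only rows appended at or before the pinned version are visible.
        return [self.rows[pos] for ver, pos in self.index.get(value, []) if ver <= as_of]


cache = VersionedCache(key="uid")
cache.append([{"uid": 1, "post": "a"}])
reader = cache.snapshot()                  # pins version 1
cache.append([{"uid": 1, "post": "b"}])    # creates version 2
old_view = cache.lookup(1, as_of=reader)             # sees only post "a"
new_view = cache.lookup(1, as_of=cache.snapshot())   # sees posts "a" and "b"
```

    Because old rows are never modified, readers need no locks; they simply filter index entries by version.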

    Representativeness of Eddy-Covariance flux footprints for areas surrounding AmeriFlux sites

    Large datasets of greenhouse gas and energy surface-atmosphere fluxes measured with the eddy-covariance technique (e.g., FLUXNET2015, AmeriFlux BASE) are widely used to benchmark models and remote-sensing products. This study addresses one of the major challenges facing model-data integration: to what spatial extent do flux measurements taken at individual eddy-covariance sites reflect model- or satellite-based grid cells? We evaluate flux footprints (the temporally dynamic source areas that contribute to measured fluxes) and the representativeness of these footprints for target areas (e.g., within 250–3000 m radii around flux towers) that are often used in flux-data synthesis and modeling studies. We examine the land-cover composition and vegetation characteristics, represented here by the Enhanced Vegetation Index (EVI), in the flux footprints and target areas across 214 AmeriFlux sites, and evaluate potential biases resulting from the footprint-to-target-area mismatch. Monthly 80% footprint climatologies vary across sites and through time, spanning four orders of magnitude from 10³ to 10⁷ m², owing to measurement heights, underlying vegetation and ground-surface characteristics, wind directions, and the turbulent state of the atmosphere. Few eddy-covariance sites are located in a truly homogeneous landscape. Thus, the common model-data integration approaches that use a fixed-extent target area across sites introduce biases on the order of 4%–20% for EVI and 6%–20% for the dominant land-cover percentage. These biases are site-specific functions of measurement heights, target-area extents, and land-surface characteristics. We advocate that flux datasets be used with footprint awareness, especially in research and applications that benchmark against models and data products with explicit spatial information. We propose a simple representativeness index, based on our evaluations, that can be used as a guide to identify site-periods suitable for specific applications and to provide general guidance for data use.
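    The footprint-to-target-area mismatch can be made concrete with a toy numerical sketch: given a gridded footprint weight map and an EVI grid, it computes the area of the 80% footprint (the smallest set of highest-weight cells that together hold 80% of the contribution), the footprint-weighted EVI, and the unweighted EVI over a fixed-radius target area centred on the tower. The grids, the 100 m cell size, and the relative-bias definition are illustrative assumptions, not the study's data or method, and the paper's representativeness index is not reproduced here.

```python
import math


def footprint_area(weights, cell_area_m2, level=0.8):
    """Area of the smallest set of highest-weight cells holding `level` of the total weight."""
    flat = sorted((w for row in weights for w in row), reverse=True)
    target = level * sum(flat)
    acc, count = 0.0, 0
    for w in flat:
        acc += w
        count += 1
        if acc >= target:
            break
    return count * cell_area_m2


def footprint_weighted_mean(weights, values):
    """Mean of `values` weighted by each cell's footprint contribution."""
    num = sum(w * v for wr, vr in zip(weights, values) for w, v in zip(wr, vr))
    den = sum(w for wr in weights for w in wr)
    return num / den


def target_area_mean(values, cell_size_m, radius_m):
    """Unweighted mean over cells whose centers lie within radius_m of the tower (grid center)."""
    rows, cols = len(values), len(values[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    inside = [values[i][j] for i in range(rows) for j in range(cols)
              if math.hypot((i - cy) * cell_size_m, (j - cx) * cell_size_m) <= radius_m]
    return sum(inside) / len(inside)


# Toy 3x3 grid of 100 m cells: most footprint weight sits in the eastern column,
# where EVI is also higher, so the landscape is not homogeneous.
weights = [[0, 0, 1], [0, 2, 5], [0, 0, 2]]
evi = [[0.2, 0.2, 0.6], [0.2, 0.4, 0.6], [0.2, 0.2, 0.6]]
area_80 = footprint_area(weights, cell_area_m2=100 * 100)
fp_evi = footprint_weighted_mean(weights, evi)
ta_evi = target_area_mean(evi, cell_size_m=100, radius_m=150)
relative_bias = (ta_evi - fp_evi) / fp_evi  # negative here: the fixed area underestimates footprint EVI
```

    In a homogeneous landscape the two means would coincide and the bias would vanish; the bias appears only when the footprint samples a different mix of surfaces than the fixed target area.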

    ECOSTRESS: NASA's next generation mission to measure evapotranspiration from the International Space Station

    The ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) was launched to the International Space Station on June 29, 2018. The primary science focus of ECOSTRESS is evapotranspiration (ET), which is produced as level-3 (L3) latent heat flux (LE) data products. These data are generated from the level-2 land surface temperature and emissivity product (L2_LSTE), in conjunction with ancillary surface and atmospheric data. Here, we provide the first validation (Stage 1, preliminary) of the global ECOSTRESS clear-sky ET product (L3_ET_PT-JPL, version 6.0) against LE measurements at 82 eddy-covariance sites around the world. Overall, the ECOSTRESS ET product performs well against the site measurements (clear-sky instantaneous/time of overpass: r² = 0.88; overall bias = 8%; normalized RMSE = 6%). ET uncertainty was generally consistent across climate zones, biome types, and times of day (ECOSTRESS samples the diurnal cycle), though temperate sites are over-represented. The high (70 m) spatial resolution of ECOSTRESS improved correlations by 85% and RMSE by 62%, relative to 1 km pixels. This paper serves as a reference for the ECOSTRESS L3 ET accuracy and Stage 1 validation status for subsequent science using these data.
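    The validation statistics quoted above (r², bias, normalized RMSE) can be computed as in the following sketch. Conventions differ between studies, so the definitions here are assumptions: r² is the squared Pearson correlation, bias is the mean error relative to the mean observed LE, and RMSE is normalized by the range of the observations (normalizing by the mean or the standard deviation is also common). The paper's exact definitions may differ, and the input values are made up for illustration.

```python
import math


def validation_stats(predicted, observed):
    """Return (r², relative bias, range-normalized RMSE) under assumed conventions."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r2 = cov ** 2 / (var_p * var_o)            # squared Pearson correlation
    bias = (mean_p - mean_o) / mean_o          # mean error relative to mean observation
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    nrmse = rmse / (max(observed) - min(observed))  # assumed normalizer: observed range
    return r2, bias, nrmse


# Made-up LE values (W m^-2) for illustration only: a constant +10 offset.
r2, bias, nrmse = validation_stats([110.0, 210.0, 310.0], [100.0, 200.0, 300.0])
```

    The example also shows why r² alone is not enough: a constant offset leaves the correlation perfect while still producing a nonzero bias and RMSE.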

    In-situ estimation of ice crystal properties at the South Pole using LED calibration data from the IceCube Neutrino Observatory

    The IceCube Neutrino Observatory instruments about 1 km³ of deep glacial ice at the geographic South Pole, using 5160 photomultipliers to detect Cherenkov light emitted by charged relativistic particles. An unexpected light propagation effect observed by the experiment is an anisotropic attenuation aligned with the local flow direction of the ice. Birefringent light propagation has been examined as a possible explanation for this effect. The predictions of a first-principles birefringence model developed for this purpose, in particular curved light trajectories resulting from asymmetric diffusion, provide a qualitatively good match to the main features of the data. This in turn allows us to deduce ice crystal properties. Since the wavelength of the detected light is short compared to the crystal size, these crystal properties include not only the crystal orientation fabric but also the average crystal size and shape, as a function of depth. By adding small empirical corrections to this first-principles model, we obtain a quantitatively accurate description of the optical properties of the IceCube glacial ice. In this paper, we present the experimental signature of ice optical anisotropy observed in IceCube LED calibration data, the theory and parametrization of the birefringence effect, the procedures for fitting these parametrizations to experimental data, and the inferred crystal properties.

    Observation of Cosmic Ray Anisotropy with Nine Years of IceCube Data


    Searching for time-dependent high-energy neutrino emission from X-ray binaries with IceCube


    A time-independent search for neutrinos from galaxy clusters with IceCube
